AITopics | indian institute

Explainable Detection of AI-Generated Images with Artifact Localization Using Faster-Than-Lies and Vision-Language Models for Edge Devices

Mathur, Aryan, Ahmed, Asaduddin, Vasoya, Pushti Amit, Sonar, Simeon Kandan, Z, Yasir, Kuppusamy, Madesh

arXiv.org Artificial Intelligence, Oct-29-2025

The increasing realism of AI-generated imagery poses challenges for verifying visual authenticity. We present an explainable image authenticity detection system that combines a lightweight convolutional classifier ("Faster-Than-Lies") with a Vision-Language Model (Qwen2-VL-7B) to classify, localize, and explain artifacts in 32x32 images. Our model achieves 96.5% accuracy on the extended CiFAKE dataset augmented with adversarial perturbations and maintains an inference time of 175ms on 8-core CPUs, enabling deployment on local or edge devices. Using autoencoder-based reconstruction error maps, we generate artifact localization heatmaps, which enhance interpretability for both humans and the VLM. We further categorize 70 visual artifact types into eight semantic groups and demonstrate explainable text generation for each detected anomaly. This work highlights the feasibility of combining visual and linguistic reasoning for interpretable authenticity detection in low-resolution imagery and outlines potential cross-domain applications in forensics, industrial inspection, and social media moderation.
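The reconstruction-error heatmap mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an autoencoder has already produced a reconstruction of the input; the function name `reconstruction_error_map`, the mean-squared-error choice, and the peak normalization are illustrative assumptions, not the authors' implementation.

```python
def reconstruction_error_map(original, reconstructed):
    """Per-pixel mean squared error across channels, scaled to [0, 1].

    Both inputs are H x W x C nested lists of floats in [0, 1].
    (Hypothetical helper; the paper's exact error measure is not given here.)
    """
    height, width = len(original), len(original[0])
    errors = [
        [
            sum((o - r) ** 2 for o, r in zip(original[y][x], reconstructed[y][x]))
            / len(original[y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]
    peak = max(max(row) for row in errors)
    if peak == 0:
        return errors  # perfect reconstruction: the heatmap is all zeros
    return [[e / peak for e in row] for row in errors]


# Toy 2x2 RGB image; the assumed autoencoder fails only on the last pixel.
image = [[[0.2, 0.2, 0.2], [0.5, 0.5, 0.5]],
         [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]]
recon = [[[0.2, 0.2, 0.2], [0.5, 0.5, 0.5]],
         [[0.1, 0.1, 0.1], [0.4, 0.4, 0.4]]]
heatmap = reconstruction_error_map(image, recon)
```

Pixels the autoencoder reconstructs poorly receive values near 1, which is the kind of signal a localization heatmap would highlight.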

artificial intelligence, machine learning, natural language, (17 more...)

2510.23775

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Physics-guided Emulators Reveal Resilience and Fragility under Operational Latencies and Outages

Dubey, Sarth, Ghosh, Subimal, Bhatia, Udit

arXiv.org Artificial Intelligence, Oct-22-2025

Reliable hydrologic and flood forecasting requires models that remain stable when input data are delayed, missing, or inconsistent. However, most advances in rainfall-runoff prediction have been evaluated under ideal data conditions, emphasizing accuracy rather than operational resilience. Here, we develop an operationally ready emulator of the Global Flood Awareness System (GloFAS) that couples long- and short-term memory networks with a relaxed water-balance constraint to preserve physical coherence. Five architectures span a continuum of information availability, from complete historical and forecast forcings to scenarios with data latency and outages, allowing systematic evaluation of robustness. Trained in minimally managed catchments across the United States and tested in more than 5,000 basins, including heavily regulated rivers in India, the emulator reproduces the hydrological core of GloFAS and degrades smoothly as information quality declines. The framework establishes operational robustness as a measurable property of hydrological machine learning and advances the design of reliable real-time forecasting systems.

Catchment response to precipitation varies in space and time with climate, storage dynamics, and human regulation, making reliable prediction dependent on both data availability and model adaptability [3, 4]. Although advances in observations, reanalysis products, and computational methods have expanded predictive capability [5-9], translating this progress into forecasting systems that operate continuously and robustly in real time remains unresolved. Operational forecasting requires models that sustain accuracy and physical realism when input data are asynchronous, incomplete, or inconsistent with the conditions used for training, and that can do so with limited human intervention [10-12].
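One way a "relaxed" water-balance constraint could enter a training loss is as a soft penalty that is zero while the balance residual stays inside a tolerance band. The sketch below is an assumption for illustration only: the residual form (precipitation minus evaporation minus runoff minus storage change), the function names, the `tolerance`, and the `weight` are all hypothetical and are not taken from the paper.

```python
def relaxed_water_balance_penalty(precip, evap, runoff, storage_change, tolerance=0.5):
    """Mean violation of the water balance beyond an allowed tolerance.

    Residual per time step: precip - evap - runoff - storage_change.
    Residuals within +/- tolerance incur no penalty (the "relaxed" part).
    """
    residuals = [
        p - e - q - ds
        for p, e, q, ds in zip(precip, evap, runoff, storage_change)
    ]
    excess = [max(0.0, abs(r) - tolerance) for r in residuals]
    return sum(excess) / len(excess)


def total_loss(predicted_runoff, observed_runoff, precip, evap, storage_change,
               weight=0.1, tolerance=0.5):
    """Mean squared runoff error plus the weighted soft physics penalty."""
    n = len(predicted_runoff)
    mse = sum((q - o) ** 2 for q, o in zip(predicted_runoff, observed_runoff)) / n
    penalty = relaxed_water_balance_penalty(
        precip, evap, predicted_runoff, storage_change, tolerance
    )
    return mse + weight * penalty


# Toy two-step example in arbitrary depth units.
precip = [10.0, 5.0]
evap = [2.0, 1.0]
storage_change = [2.0, 1.0]
predicted = [6.0, 1.0]   # second step leaves a residual of 2.0
observed = [6.0, 2.0]
loss = total_loss(predicted, observed, precip, evap, storage_change)
```

In this toy case the first step closes the balance exactly, while the second exceeds the tolerance and contributes to the penalty.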

artificial intelligence, machine learning, real time system, (21 more...)

2510.18535

Country:

North America > United States (0.66)
Asia > India > Gujarat (0.15)
Asia > India > Maharashtra (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (0.88)

GroMo: Plant Growth Modeling with Multiview Images

Bhatt, Ruchi, Bansal, Shreya, Chander, Amanpreet, Kaur, Rupinder, Singh, Malya, Kankanhalli, Mohan, Saddik, Abdulmotaleb El, Saini, Mukesh Kumar

arXiv.org Artificial Intelligence, Mar-9-2025

Understanding plant growth dynamics is essential for applications in agriculture and plant phenotyping. We present the Growth Modelling (GroMo) challenge, which is designed for two primary tasks: (1) plant age prediction and (2) leaf count estimation, both essential for crop monitoring and precision agriculture. For this challenge, we introduce GroMo25, a dataset with images of four crops: radish, okra, wheat, and mustard. Each crop consists of multiple plants (p1, p2, ..., pn) captured over different days (d1, d2, ..., dm) and categorized into five levels (L1, L2, L3, L4, L5). Each plant is captured from 24 different angles with a 15-degree gap between images. Participants are required to perform both tasks for all four crops with these multiview images. We propose a Multiview Vision Transformer (MVVT) model for the GroMo challenge and evaluate its crop-wise performance on GroMo25. MVVT reports an average MAE of 7.74 for age prediction and an MAE of 5.52 for leaf count. The GroMo Challenge aims to advance plant phenotyping research by encouraging innovative solutions for tracking and predicting plant growth. The GitHub repository is publicly available at https://github.com/mriglab/GroMo-Plant-Growth-Modeling-with-Multiview-Images.
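The capture geometry and the evaluation metric in the abstract above can be made concrete with a short sketch. The helper names `view_angles` and `mean_absolute_error` and the toy predictions are illustrative assumptions; only the 24 views, the 15-degree gap, and the use of MAE come from the abstract.

```python
def view_angles(num_views=24, step_degrees=15):
    """Camera angles for one plant: 24 views spaced 15 degrees apart."""
    return [i * step_degrees for i in range(num_views)]


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and ground truth."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


angles = view_angles()

# Toy plant-age predictions in days (made-up numbers, not GroMo25 data).
predicted_age = [10, 20, 30]
true_age = [12, 18, 33]
age_mae = mean_absolute_error(predicted_age, true_age)
```

With 24 views at 15-degree steps, the angles run from 0 to 345 degrees, so the views together cover one full rotation around the plant.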

age prediction, dataset, leaf counting, (12 more...)

2503.06608

Country:

Asia > India (0.06)
Asia > Singapore (0.05)
Asia > Middle East > UAE (0.04)

Genre: Research Report (0.84)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)
Information Technology > Artificial Intelligence > Vision (0.68)

Everyday Speech in the Indian Subcontinent

Pathak, Utkarsh, Gunda, Chandra Sai Krishna, Sathiyamoorthy, Sujitha, Agarwal, Keshav, Murthy, Hema A.

arXiv.org Artificial Intelligence, Oct-14-2024

India has 1369 languages, of which 22 are official. About 13 different scripts are used to represent these languages. A Common Label Set (CLS) was developed based on phonetics to address the large vocabulary of units required in the End-to-End (E2E) framework for multilingual synthesis. This reduced the footprint of the synthesizer and also enabled fast adaptation to new languages with similar phonotactics, provided the language scripts belonged to the same family. In this paper, we provide new insights into speech synthesis where the script belongs to one family while the phonotactics come from another. Indian language text is first converted to CLS, and then a synthesizer that matches the phonotactics of the language is used. Quality akin to that of a native speaker is obtained for Sanskrit and Konkani with zero adaptation data, using Kannada and Marathi synthesizers respectively. Further, this approach also lends itself to seamless code switching across 13 Indian languages and English in a given native speaker's voice.

artificial intelligence, machine learning, synthesis, (15 more...)

2410.10508

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)

G$^{2}$TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models

Arora, Riya, Narendranath, Niveditha, Tambi, Aman, Zachariah, Sandeep S., Chakraborty, Souvik, Paul, Rohan

arXiv.org Artificial Intelligence, Oct-9-2024

Consider the scenario where a human cleans a table and a robot observing the scene is instructed with the task "Remove the cloth using which I wiped the table". Instruction following with temporal reasoning requires the robot to identify the relevant past object interaction, ground the object of interest in the present scene, and execute the task according to the human's instruction. Directly grounding utterances that reference past interactions to objects is challenging due to the multi-hop nature of such references and the large space of object groundings in a video stream observing the robot's workspace. Our key insight is to factor the temporal reasoning task as (i) estimating the video interval associated with the event reference, (ii) performing spatial reasoning over the interaction frames to infer the intended object, and (iii) semantically tracking the object's location up to the current scene to enable future robot interactions. Our approach leverages existing large pre-trained models (which possess inherent generalization capabilities) and combines them appropriately for temporal grounding tasks. Evaluation on a video-language corpus acquired with a robot manipulator displaying rich temporal interactions in spatially complex scenes yields an average accuracy of 70.10%. The dataset, code, and videos are available at https://reail-iitdelhi.github.io/temporalreasoning.github.io/.

arxiv preprint arxiv, interaction, reasoning, (10 more...)

2410.07494

Country: North America > Canada > British Columbia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Interview with AAAI Fellow Anima Anandkumar: Neural Operators for science and engineering problems

AIHub, Aug-20-2024, 09:50:33 GMT

Each year the Association for the Advancement of Artificial Intelligence (AAAI) recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. We've been talking to some of the 2024 AAAI Fellows to find out more about their research. In this interview, we meet Anima Anandkumar and find out about her work on Neural Operators, of which she is the inventor. Neural Operators are able to learn complex physical phenomena that occur at multiple resolutions while standard neural networks are unable to do so. Standard neural networks use a fixed number of pixels or resolution to learn a phenomenon, while neural operators represent data as continuous functions.

aaai fellow anima anandkumar, neural operator, science and engineering problem, (12 more...)

Genre: Personal (0.70)

Industry:

Information Technology (0.51)
Health & Medicine (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

A Hybrid-Layered System for Image-Guided Navigation and Robot Assisted Spine Surgery

T, Suhail Ansari, Maik, Vivek, Naheem, Minhas, Ram, Keerthi, Lakshmanan, Manojkumar, Sivaprakasam, Mohanasankar

arXiv.org Artificial Intelligence, Jun-7-2024

In response to the growing demand for precise and affordable solutions for Image-Guided Spine Surgery (IGSS), this paper presents the development of a Robot-Assisted and Navigation-Guided IGSS System. The work integrates current technologies to attain the required surgical precision and limit user radiation exposure, thereby addressing the limitations of manual surgical methods. We propose an IGSS workflow and system architecture employing a hybrid-layered approach, combining modular and integrated system architectures in distinct layers to develop an affordable system that supports seamless integration, scalability, and reconfigurability. We developed and integrated the system and extensively tested it on phantoms and cadavers. On phantoms, the proposed system's accuracy is 1.020 mm with navigation guidance and 1.11 mm with robot assistance. Cadaveric validation showed similar performance: as per the Gertzbein-Robbins scale, 84% of screw placements were grade A and 10% were grade B using navigation guidance, while 90% were grade A and 10% were grade B using robot assistance, supporting the system's efficacy for IGSS. The evaluated performance is adequate for IGSS and on par with existing systems in the literature and those commercially available. The user radiation is lower than in the literature, given that the system requires only an average of 3 C-Arm images per pedicle screw placement and verification.

accuracy, architecture, module, (14 more...)

doi: 10.1109/SII58957.2024.10417647

2406.04644

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Europe > Italy (0.04)
Asia > Vietnam > Quảng Ninh Province > Hạ Long (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Therapeutic Area > Orthopedics/Orthopedic Surgery (0.72)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.72)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

MedPromptExtract (Medical Data Extraction Tool): Anonymization and Hi-fidelity Automated data extraction using NLP and prompt engineering

Srivastava, Roomani, Prasad, Suraj, Bhat, Lipika, Deshpande, Sarvesh, Das, Barnali, Jadhav, Kshitij

arXiv.org Artificial Intelligence, Jun-6-2024

A major roadblock in the seamless digitization of medical records remains the lack of interoperability of existing records. Extracting the relevant medical information required for further treatment planning, or even research, is a time-consuming, labour-intensive task that takes up doctors' valuable time. In this demo paper, we present MedPromptExtract, an automated tool that uses a combination of semi-supervised learning, large language models, natural language processing, and prompt engineering to convert unstructured medical records into structured data that is amenable to further analysis.

extraction, information, medpromptextract, (13 more...)

2405.02664

Country:

Asia > India > Maharashtra > Mumbai (0.06)
North America > United States > Idaho > Ada County > Boise (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Classification of executive functioning performance post-longitudinal tDCS using functional connectivity and machine learning methods

Rao, Akash K, Menon, Vishnu K, Uttrani, Shashank, Dixit, Ayushman, Verma, Dipanshu, Dutt, Varun

arXiv.org Artificial Intelligence, Jan-31-2024

Executive functioning is a cognitive process that enables humans to plan, organize, and regulate their behavior in a goal-directed manner. Understanding and classifying the changes in executive functioning after longitudinal interventions (like transcranial direct current stimulation (tDCS)) has not been explored in the literature. This study employs functional connectivity and machine learning algorithms to classify executive functioning performance post-tDCS. Fifty subjects were divided into experimental and placebo control groups. EEG data was collected while subjects performed an executive functioning task on Day 1. The experimental group received tDCS during task training from Day 2 to Day 8, while the control group received sham tDCS. On Day 10, subjects repeated the tasks specified on Day 1. Different functional connectivity metrics were extracted from EEG data and eventually used for classifying executive functioning performance using different machine learning algorithms. Results revealed that a novel combination of partial directed coherence and multi-layer perceptron (along with recursive feature elimination) resulted in a high classification accuracy of 95.44%. We discuss the implications of our results in developing real-time neurofeedback systems for assessing and enhancing executive functioning performance post-tDCS administration.

accuracy, algorithm, functional connectivity metric, (11 more...)

2401.177

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > India > Himachal Pradesh (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Effect of dimensionality change on the bias of word embeddings

Rai, Rohit Raj, Awekar, Amit

arXiv.org Artificial Intelligence, Dec-28-2023

Word embedding methods (WEMs) are extensively used for representing text data. The dimensionality of these embeddings varies across tasks and implementations. The effect of dimensionality change on the accuracy of the downstream task is a well-explored question. However, how the dimensionality change affects the bias of word embeddings needs to be investigated. Using the English Wikipedia corpus, we study this effect for two static (Word2Vec and fastText) and two context-sensitive (ELMo and BERT) WEMs. We have two observations. First, there is a significant variation in the bias of word embeddings with the dimensionality change. Second, there is no uniformity in how the dimensionality change affects the bias of word embeddings. These factors should be considered while selecting the dimensionality of word embeddings.
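To make "bias of word embeddings" concrete, the sketch below computes one simple association score: the difference in cosine similarity between a target word and two attribute words. This is a generic illustration and an assumption, not the metric used in the paper; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def association_bias(target, attribute_a, attribute_b):
    """Positive if the target sits closer to attribute A, negative if closer to B."""
    return cosine_similarity(target, attribute_a) - cosine_similarity(target, attribute_b)


# Toy 2-dimensional vectors standing in for learned embeddings.
target_vec = [1.0, 0.0]
attr_a_vec = [1.0, 0.0]
attr_b_vec = [0.0, 1.0]
score = association_bias(target_vec, attr_a_vec, attr_b_vec)
```

Studying the effect of dimensionality with a score like this would mean training separate embeddings at each dimensionality and recomputing the score on each set.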

dimensionality, dimensionality change, fasttext, (16 more...)

2312.17292

Country:

Asia > India > West Bengal > Kolkata (0.06)
Asia > India > Assam > Guwahati (0.05)
Oceania > Australia > Victoria > Melbourne (0.05)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Software (0.95)
